Pre-processing Your Data


Pre-processing data by platform:

  • Illumina

Input


In [ ]:
forward = /Users/squiresrb/Documents/BCBB/Support/Gab\ Parra-Gonzalez/Para6-10_S2_L001_R1_001.fastq 
reverse = /Users/squiresrb/Documents/BCBB/Support/Gab\ Parra-Gonzalez/Para6-10_S2_L001_R2_001.fastq 
output_forward_paired = output_forward_paired.fq.gz 
output_forward_unpaired = output_forward_unpaired.fq.gz 
output_reverse_paired = output_reverse_paired.fq.gz 
output_reverse_unpaired = output_reverse_unpaired.fq.gz

Platform: Illumina

Trimmomatic

Trimmomatic performs a variety of useful trimming tasks for illumina paired-end and single ended data.The selection of trimming steps and their associated parameters are supplied on the command line.

The current trimming steps are:

ILLUMINACLIP: Cut adapter and other illumina-specific sequences from the read. SLIDINGWINDOW: Perform a sliding window trimming, cutting once the average quality within the window falls below a threshold. LEADING: Cut bases off the start of a read, if below a threshold quality TRAILING: Cut bases off the end of a read, if below a threshold quality CROP: Cut the read to a specified length HEADCROP: Cut the specified number of bases from the start of the read MINLEN: Drop the read if it is below a specified length TOPHRED33: Convert quality scores to Phred-33 TOPHRED64: Convert quality scores to Phred-64 It works with FASTQ (using phred + 33 or phred + 64 quality scores, depending on the Illumina pipeline used), either uncompressed or gzipp'ed FASTQ. Use of gzip format is determined based on the .gz extension.

For single-ended data, one input and one output file are specified, plus the processing steps. For paired-end data, two input files are specified, and 4 output files, 2 for the 'paired' output where both reads survived the processing, and 2 for corresponding 'unpaired' output where a read survived, but the partner read did not.

The command line statement below will perform the following:

  • Remove adapters
  • Remove leading low quality or N bases (below quality 3)
  • Remove trailing low quality or N bases (below quality 3)
  • Scan the read with a 4-base wide sliding window, cutting when the average quality per base drops below 15
  • Drop reads below the 36 bases long

In [6]:
!java -jar /Applications/bioinfo/Trimmomatic-0.32/trimmomatic-0.32.jar PE -phred33 forward reverse output_forward_paired output_forward_unpaired output_reverse_paired output_reverse_unpaired ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36


TrimmomaticPE: Started with arguments: -phred33 forward reverse output_forward_paired output_forward_unpaired output_reverse_paired output_reverse_unpaired ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
Multiple cores found: Using 8 threads
Oct 10, 2014 2:47:19 PM org.usadellab.trimmomatic.trim.IlluminaClippingTrimmer makeIlluminaClippingTrimmer
SEVERE: null
java.io.FileNotFoundException: /Users/squiresrb/iPython-Notebooks/NGS/TruSeq3-PE.fa (No such file or directory)
	at java.io.FileInputStream.open(Native Method)
	at java.io.FileInputStream.<init>(FileInputStream.java:120)
	at org.usadellab.trimmomatic.fasta.FastaParser.parse(FastaParser.java:54)
	at org.usadellab.trimmomatic.trim.IlluminaClippingTrimmer.loadSequences(IlluminaClippingTrimmer.java:107)
	at org.usadellab.trimmomatic.trim.IlluminaClippingTrimmer.makeIlluminaClippingTrimmer(IlluminaClippingTrimmer.java:70)
	at org.usadellab.trimmomatic.trim.TrimmerFactory.makeTrimmer(TrimmerFactory.java:27)
	at org.usadellab.trimmomatic.TrimmomaticPE.run(TrimmomaticPE.java:495)
	at org.usadellab.trimmomatic.Trimmomatic.main(Trimmomatic.java:35)
Exception in thread "main" java.io.FileNotFoundException: forward (No such file or directory)
	at java.io.FileInputStream.open(Native Method)
	at java.io.FileInputStream.<init>(FileInputStream.java:120)
	at org.usadellab.trimmomatic.fastq.FastqParser.parse(FastqParser.java:127)
	at org.usadellab.trimmomatic.TrimmomaticPE.process(TrimmomaticPE.java:251)
	at org.usadellab.trimmomatic.TrimmomaticPE.run(TrimmomaticPE.java:498)
	at org.usadellab.trimmomatic.Trimmomatic.main(Trimmomatic.java:35)

In [ ]: